128 research outputs found

    A Conditional Random Field for Multiple-Instance Learning

    We present MI-CRF, a conditional random field (CRF) model for multiple instance learning (MIL). MI-CRF models bags as nodes in a CRF with instances as their states. It combines discriminative unary instance classifiers and pairwise dissimilarity measures. We show that both forces improve the classification performance. Unlike other approaches, MI-CRF considers all bags jointly during training as well as during testing. This makes it possible to classify test bags in an imputation setup. The parameters of MI-CRF are learned using constraint generation. Furthermore, we show that MI-CRF can incorporate previous MIL algorithms to improve on their results. MI-CRF obtains competitive results on five standard MIL datasets.

    LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks

    Recently, deep neural networks have achieved remarkable performance on the task of object detection and recognition. The reason for this success is mainly grounded in the availability of large scale, fully annotated datasets, but the creation of such a dataset is a complicated and costly task. In this paper, we propose a novel method for weakly supervised object detection that simplifies the process of gathering data for training an object detector. We train an ensemble of two models that work together in a student-teacher fashion. Our student (localizer) is a model that learns to localize an object; the teacher (assessor) assesses the quality of the localization and provides feedback to the student. The student uses this feedback to learn how to localize objects and is thus entirely supervised by the teacher, as we are using no labels for training the localizer. In our experiments, we show that our model is very robust to noise and reaches competitive performance compared to a state-of-the-art fully supervised approach. We also show the simplicity of creating a new dataset, based on a few videos (e.g. downloaded from YouTube) and artificially generated data. Comment: To appear in AMV18. Code, datasets and models available at https://github.com/Bartzi/loan

    Overview of the 2005 cross-language image retrieval track (ImageCLEF)

    The purpose of this paper is to outline efforts from the 2005 CLEF cross-language image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore the use of both text and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: ad-hoc retrieval from an historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. 24 research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks and the submissions from participating groups, and summarise the main findings.

    Global and efficient self-similarity for object classification and detection

    Self-similarity is an attractive image property which has recently found its way into object recognition in the form o

    Visual and semantic similarity in ImageNet

    Many computer vision approaches take for granted positive answers to questions such as “Are semantic categories visually separable?” and “Is visual similarity correlated to semantic similarity?”. In this paper, we study experimentally whether these assumptions hold and show parallels to questions investigated in cognitive science about the human visual system. The insights gained from our analysis enable building a novel distance function between images assessing whether they are from the same basic-level category. This function goes beyond direct visual distance as it also exploits semantic similarity measured through ImageNet. We demonstrate experimentally that it outperforms purely visual distances.

    Informed perspectives on human annotation using neural signals

    In this work we explore how neurophysiological correlates related to attention and perception can be used to better understand the image-annotation task. We explore the nature of the highly variable labelling data often seen across annotators. Our results indicate potential issues with regard to ‘how well’ a person manually annotates images and variability across annotators. We propose such issues arise in part as a result of subjectively interpretable instructions that may fail to elicit similar labelling behaviours and decision thresholds across participants. We find instances where an individual’s annotations differ from a group consensus, even though their EEG (electroencephalography) signals indicate that they were in fact likely in consensus with the group. We offer a new perspective on how EEG can be incorporated in an annotation task to reveal information not readily captured using manual annotations alone. As crowd-sourcing resources become more readily available for annotation tasks, one can reconsider the quality of such annotations. Furthermore, with the availability of consumer EEG hardware, we speculate that we are approaching a point where it may be feasible to better harness an annotator’s time and decisions by examining neural responses as part of the process. In this regard, we examine strategies to deal with inter-annotator sources of noise and correlation that can be used to understand the relationship between annotators at a neural level.

    Measuring the Objectness of Image Windows

    Geometry Constrained Weakly Supervised Object Localization

    We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization (WSOL). GC-Net consists of three modules: a detector, a generator and a classifier. The detector predicts the object location defined by a set of coefficients describing a geometric shape (i.e. ellipse or rectangle), which is geometrically constrained by the mask produced by the generator. The classifier takes the resulting masked images as input and performs two complementary classification tasks for the object and background. To make the mask more compact and more complete, we propose a novel multi-task loss function that takes into account the area of the geometric shape, the categorical cross-entropy and the negative entropy. In contrast to previous approaches, GC-Net is trained end-to-end and predicts object location without any post-processing (e.g. thresholding) that may require additional tuning. Extensive experiments on the CUB-200-2011 and ILSVRC2012 datasets show that GC-Net outperforms state-of-the-art methods by a large margin. Our source code is available at https://github.com/lwzeng/GC-Net. Comment: This paper (ID 5424) is accepted to ECCV 202

    Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution

    Given a set of images containing objects from the same category, the task of image co-localization is to identify and localize each instance. This paper shows that this problem can be solved by a simple but intriguing idea: a common object detector can be learnt by making its detection confidence scores distributed like those of a strongly supervised detector. More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them. Thus, we devise an entropy-based objective function to enforce the above property when learning the common object detector. Once the detector is learnt, we resort to a segmentation approach to refine the localization. We show that despite its simplicity, our approach outperforms state-of-the-art methods. Comment: Accepted to Proc. European Conf. Computer Vision 201
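
    The intuition behind an entropy-based objective over proposal scores can be illustrated with a minimal sketch. This is not the paper's objective function: the softmax normalisation, the function names `softmax` and `score_entropy`, and the example scores are all assumptions chosen for illustration. It only shows that a score distribution concentrated on a few proposals has lower entropy than a flat one, which is the property the abstract says the objective enforces.

```python
import math


def softmax(scores):
    """Turn raw proposal scores into a probability distribution (assumed normalisation)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_entropy(scores):
    """Shannon entropy (in nats) of the softmax-normalised proposal scores."""
    probs = softmax(scores)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical scores for five proposals from one image.
peaked = [8.0, 0.0, 0.0, 0.0, 0.0]  # one confident proposal, the rest low
flat = [1.0, 1.0, 1.0, 1.0, 1.0]    # detector cannot tell proposals apart

print(score_entropy(peaked) < score_entropy(flat))
```

    Under these assumptions, minimising such an entropy term would push a detector towards the peaked pattern described in the abstract; how the paper actually formulates and optimises its objective is not stated in this listing.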

    Latent Log-Linear Models for Handwritten Digit Classification
